Enable HPU Fused SDPA for Qwen3-VL vision attention using attention masks #787


Merged

iboiko-habana merged 7 commits into vllm-project:main from slokesha:slokesha/qwen3_enable_mask

Jan 19, 2026

slokesha commented Jan 7, 2026

Qwen3-VL vision attention now calls FusedSDPA.apply directly when the query sequence length is within the supported fused range (q_len ≤ 65536).
This replaces the per-block Q/K/V attention loop with a single masked call to the optimized HPU fused SDPA kernel for vision attention.
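
The following sketch illustrates the idea on CPU and is not the PR's code. It rests on several assumptions. PyTorch's `torch.nn.functional.scaled_dot_product_attention` stands in for the HPU `FusedSDPA.apply` kernel, whose exact signature is not reproduced here. Block boundaries are assumed to arrive as a cumulative-length tensor `cu_seqlens` (for example `[0, 4, 10]` for blocks of 4 and 6 tokens). The helper names `block_diagonal_mask`, `per_block_attention`, `masked_attention`, and `vision_attention` are hypothetical. A block-diagonal boolean mask lets one attention call over the whole sequence reproduce the per-block loop, and the 65536 limit from the description selects between the two paths.

```python
import torch
import torch.nn.functional as F

# Fused-path limit on query length, taken from the PR description.
FUSED_SDPA_MAX_Q_LEN = 65536


def block_diagonal_mask(cu_seqlens: torch.Tensor) -> torch.Tensor:
    """Boolean (L, L) mask, True where query and key share a block (hypothetical helper)."""
    lengths = (cu_seqlens[1:] - cu_seqlens[:-1]).long()
    # Tag every token with the index of the block it belongs to.
    block_ids = torch.repeat_interleave(
        torch.arange(lengths.numel(), device=cu_seqlens.device), lengths)
    # Tokens may attend to each other only inside the same block.
    return block_ids[:, None] == block_ids[None, :]


def per_block_attention(q, k, v, cu_seqlens):
    """Reference path: one SDPA call per block, outputs concatenated along seq."""
    bounds = cu_seqlens.tolist()
    outs = [
        F.scaled_dot_product_attention(
            q[..., s:e, :], k[..., s:e, :], v[..., s:e, :])
        for s, e in zip(bounds[:-1], bounds[1:])
    ]
    return torch.cat(outs, dim=-2)


def masked_attention(q, k, v, cu_seqlens):
    """Single SDPA call over the full sequence (CPU stand-in for FusedSDPA.apply)."""
    mask = block_diagonal_mask(cu_seqlens)  # True = position takes part in attention
    return F.scaled_dot_product_attention(q, k, v, attn_mask=mask)


def vision_attention(q, k, v, cu_seqlens):
    """Use the masked single call within the fused range, else fall back to the loop."""
    if q.shape[-2] <= FUSED_SDPA_MAX_Q_LEN:
        return masked_attention(q, k, v, cu_seqlens)
    return per_block_attention(q, k, v, cu_seqlens)


if __name__ == "__main__":
    torch.manual_seed(0)
    # Shapes: (batch, heads, seq, head_dim); two blocks of 4 and 6 tokens.
    q, k, v = (torch.randn(1, 2, 10, 8) for _ in range(3))
    cu_seqlens = torch.tensor([0, 4, 10])
    fused = vision_attention(q, k, v, cu_seqlens)
    loop = per_block_attention(q, k, v, cu_seqlens)
    print(torch.allclose(fused, loop, atol=1e-5))
```

Masked positions receive zero attention weight, so the single masked call matches the per-block loop up to floating-point rounding. The mask itself grows quadratically with sequence length, so materializing it is only practical up to some bound.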

The change aligns Qwen3-VL with the optimized path already used by Qwen2.5-VL on Gaudi, improving efficiency while preserving identical model outputs.

github-actions bot commented Jan 7, 2026

🚧 CI Blocked

The main CI workflow was not started for the following reason:

This is a Draft PR. Please mark it as 'Ready for Review' to trigger the CI.


          Updated qwen3 to use HPUAttention

e07204d

Signed-off-by: slokesha <slokeshappa@habana.ai>

slokesha force-pushed the slokesha/qwen3_enable_mask branch from c245eb8 to e07204d

January 7, 2026 21:07


slokesha marked this pull request as ready for review

January 7, 2026 22:00

slokesha requested review from adobrzyn, afierka-intel, iboiko-habana, kamil-kaczor, ksmusz, kzawora-intel, mgawarkiewicz-intel, michalkuligowski and xuechendi as code owners

January 7, 2026 22:00

slokesha marked this pull request as draft

January 7, 2026 22:00

github-actions bot commented Jan 7, 2026

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.


          Merge branch 'main' into slokesha/qwen3_enable_mask

47143b8

Signed-off-by: Spurthi Lokeshappa <slokeshappa@habana.ai>

slokesha marked this pull request as ready for review

January 7, 2026 22:00


slokesha changed the title from "Use FuseSDPA for Qwen3_VL" to "Enable HPU Fused SDPA for Qwen3-VL vision attention using attention masks"

github-actions bot mentioned this pull request

🚦 Team Review Dashboard #701

Open

slokesha and others added 2 commits

January 8, 2026 09:03


          Merge branch 'main' into slokesha/qwen3_enable_mask

1dca268


          renamed HPUQwen3_VisionTransformer class

8a28426

Signed-off-by: slokesha <spurthi.lokeshappa@intel.com>

github-actions bot commented Jan 12, 2026

🚧 CI Blocked

The main CI workflow was not started for the following reason:

Your branch is behind the base branch. Please merge or rebase to get the latest changes.

slokesha added 3 commits

January 12, 2026 14:50


          Merge branch 'main' into slokesha/qwen3_enable_mask

e304616


          Merge branch 'main' into slokesha/qwen3_enable_mask

822ad38


          Merge branch 'main' into slokesha/qwen3_enable_mask

135bb54

slokesha mentioned this pull request

Slokesha/enable qwen3 #828

Draft

iboiko-habana approved these changes


github-actions bot commented Jan 17, 2026

✅ CI Passed

All checks passed successfully against the following vllm commit:
6218034dd7f9a56596e4fd8c8c8fc1d8011ed9c2

iboiko-habana merged commit 7011e31 into vllm-project:main

52 checks passed


Reviewers

iboiko-habana approved these changes

Awaiting requested review from code owners: kzawora-intel, xuechendi, adobrzyn, mgawarkiewicz-intel, afierka-intel, michalkuligowski, kamil-kaczor, ksmusz
